AITopics | maximum entropy monte-carlo planning

Maximum Entropy Monte-Carlo Planning

Neural Information Processing Systems, Dec-25-2025, 15:35:33 GMT

We develop a new algorithm for online planning in large-scale sequential decision problems that improves upon the worst-case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves on the polynomial convergence rate of UCT. Our experimental results also demonstrate that MENTS is more sample efficient than UCT in both synthetic problems and Atari 2600 games.
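The "softmax value" the abstract refers to can be illustrated with the standard log-sum-exp form used in maximum entropy policy optimization. The sketch below is only an illustration under that assumption: the function names and the temperature parameter `tau` are hypothetical and do not come from the listing itself.

```python
import math


def softmax_value(q_values, tau):
    """Softmax (log-sum-exp) value: tau * log(sum(exp(q / tau))).

    Computed with the max subtracted first for numerical stability.
    `tau` is an assumed temperature parameter; it must be positive.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))


def softmax_policy(q_values, tau):
    """Boltzmann policy induced by the softmax value: exp((q - V) / tau)."""
    v = softmax_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]


if __name__ == "__main__":
    q = [0.2, 0.5, 0.9]
    for tau in (1.0, 0.1, 0.01):
        print(tau, softmax_value(q, tau), softmax_policy(q, tau))
```

The softmax value is never smaller than the largest action value and approaches it as `tau` shrinks, which is why it can stand in for a hard max while still keeping every action's probability positive.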

electronic proceedings, maximum entropy monte-carlo planning, name change, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Reviews: Maximum Entropy Monte-Carlo Planning

Neural Information Processing Systems, Jan-25-2025, 02:42:33 GMT

This paper proposes a new MCTS algorithm, Maximum Entropy for Tree Search (MENTS), which combines the maximum entropy policy optimization framework with MCTS for more efficient online planning in sequential decision problems. The main idea is to replace the Monte Carlo value estimate with the softmax value estimate used in the maximum entropy policy optimization framework, so that the state value can be estimated and back-propagated more efficiently in the search tree. Another main novelty is an optimal algorithm, Empirical Exponential Weight (E2W), which serves as the tree policy and encourages more exploration. The paper shows that MENTS achieves an exponential convergence rate towards finding the optimal action at the root of the tree, which is much faster than the polynomial convergence rate of UCT. The experimental results also demonstrate that MENTS performs significantly better than UCT in terms of sample efficiency, in both synthetic problems and Atari games.
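To make the idea of an exploring tree policy concrete, here is a minimal sketch of an E2W-style sampling rule: a Boltzmann distribution over the current value estimates, mixed with a uniform distribution whose weight decays over time. The review does not state the exact rule, so the decay schedule `epsilon * n / log(t + 1)`, the clamp to 1, the parameter names and the function names are all assumptions made for illustration, not the authors' definition.

```python
import math
import random


def e2w_distribution(q_estimates, t, tau=1.0, epsilon=0.1):
    """Mix a Boltzmann distribution with uniform exploration.

    q_estimates: current value estimates, one per action.
    t: number of visits so far (t >= 1).
    tau, epsilon: assumed temperature and exploration parameters.
    The schedule below is an assumption, clamped to at most 1.
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    n = len(q_estimates)
    lam = min(1.0, epsilon * n / math.log(t + 1))
    m = max(q_estimates)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_estimates]
    z = sum(weights)
    return [(1.0 - lam) * w / z + lam / n for w in weights]


def sample_action(probs, rng=random):
    """Draw one action index according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [0.1, 0.4, 0.3]
    for t in (1, 10, 1000):
        print(t, e2w_distribution(q, t))
    print(sample_action(e2w_distribution(q, 1000)))
```

Early on the uniform term dominates, so every action keeps being tried; as `t` grows the distribution concentrates on actions with higher estimated value without ever assigning any action zero probability.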

algorithm, convergence rate, maximum entropy monte-carlo planning, (9 more...)

Industry: Leisure & Entertainment > Games > Computer Games (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.88)

Reviews: Maximum Entropy Monte-Carlo Planning

Neural Information Processing Systems, Jan-25-2025, 02:42:22 GMT

This paper presents an appealing idea: combining current max-entropy methods in RL with Monte-Carlo Tree Search. A theoretical result shows an improved rate of convergence, while empirical results show improved sample efficiency. The initial reviews were quite positive; I only noted a small number of issues mentioned in the reviews of R1 and R3. In our discussions after reading the author feedback, R3 noted that some of his concerns had not been addressed. R2 replied that these concerns are relatively minor and can be addressed in the final version.

maximum entropy monte-carlo planning, result show

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.40)

Maximum Entropy Monte-Carlo Planning

Neural Information Processing Systems, Oct-10-2024, 09:15:38 GMT

We develop a new algorithm for online planning in large-scale sequential decision problems that improves upon the worst-case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves on the polynomial convergence rate of UCT.

convergence rate, decision problem, maximum entropy monte-carlo planning, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.91)

Maximum Entropy Monte-Carlo Planning

Xiao, Chenjun, Huang, Ruitong, Mei, Jincheng, Schuurmans, Dale, Müller, Martin

Neural Information Processing Systems, Mar-19-2020, 00:30:53 GMT

We develop a new algorithm for online planning in large-scale sequential decision problems that improves upon the worst-case efficiency of UCT. The idea is to augment Monte-Carlo Tree Search (MCTS) with maximum entropy policy optimization, evaluating each search node by softmax values back-propagated from simulation. To establish the effectiveness of this approach, we first investigate the single-step decision problem, stochastic softmax bandits, and show that softmax values can be estimated at an optimal convergence rate in terms of mean squared error. We then extend this approach to general sequential decision making by developing a general MCTS algorithm, Maximum Entropy for Tree Search (MENTS). We prove that the probability of MENTS failing to identify the best decision at the root decays exponentially, which fundamentally improves on the polynomial convergence rate of UCT.

convergence rate, decision problem, maximum entropy monte-carlo planning, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.91)
